30 research outputs found
Non-abelian Littlewood-Offord inequalities
In 1943, Littlewood and Offord proved the first anti-concentration result for
sums of independent random variables. Their result has since been strengthened
and generalized by generations of researchers, with applications in several
areas of mathematics.
In this paper, we present the first non-abelian analogue of the
Littlewood-Offord result, a sharp anti-concentration inequality for products of
independent random variables.
Comment: 14 pages. Second version: the dependence of the upper bound on the
matrix size in the main results has been removed.
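For concreteness, the classical (abelian) Littlewood-Offord bound, in the sharp form due to Erdős, states that for real coefficients a_i with |a_i| >= 1 and independent uniform random signs, the sum hits any fixed value with probability at most C(n, floor(n/2)) / 2^n. A brute-force sketch in pure Python (helper names are illustrative, not from the paper) checks this on small instances:

```python
from itertools import product
from math import comb

def max_atom_probability(coeffs):
    """Exact maximum point-probability of S = sum(eps_i * a_i) over uniform signs eps_i."""
    n = len(coeffs)
    counts = {}
    for signs in product((-1, 1), repeat=n):
        s = sum(e * a for e, a in zip(signs, coeffs))
        counts[s] = counts.get(s, 0) + 1
    return max(counts.values()) / 2 ** n

def littlewood_offord_bound(n):
    """Erdos's sharp bound: C(n, floor(n/2)) / 2^n, attained when all a_i are equal."""
    return comb(n, n // 2) / 2 ** n

# All-equal coefficients attain the bound: 20/64 = 0.3125 for n = 6.
print(max_atom_probability([1] * 6))   # 0.3125
print(littlewood_offord_bound(6))      # 0.3125
# Distinct powers of two give all-distinct sums, hence probability 1/2^n.
print(max_atom_probability([1, 2, 4])) # 0.125
```

The non-abelian analogue in the paper replaces the sum by a product of independent random (e.g. matrix-valued) factors, where this elementary counting argument no longer applies.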
Deep Network for Simultaneous Decomposition and Classification in UWB-SAR Imagery
Classifying buried and obscured targets of interest from other natural and
manmade clutter objects in the scene is an important problem for the U.S. Army.
Targets of interest are often represented by signals captured using
low-frequency (UHF to L-band) ultra-wideband (UWB) synthetic aperture radar
(SAR) technology. This technology has been used in various applications,
including ground penetration and sensing-through-the-wall. However, the
technology still faces significant issues in this particular frequency band:
low-resolution SAR imagery, low radar cross sections (RCS), objects that are
small relative to the radar signal wavelength, and heavy interference. The
classification problem was first, and only partially, addressed by the sparse
representation-based classification (SRC) method, which can extract noise from
signals and exploit cross-channel information. Despite promising results,
SRC-related methods have drawbacks in representing nonlinear relations
and dealing with larger training sets. In this paper, we propose a Simultaneous
Decomposition and Classification Network (SDCN) to alleviate noise inferences
and enhance classification accuracy. The network contains two jointly trained
sub-networks: the decomposition sub-network handles denoising, while the
classification sub-network discriminates targets from confusers. Experimental
results show significant improvements over a network without the decomposition
sub-network and over SRC-related methods.
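A minimal sketch of the joint-training idea follows. This is not the authors' architecture (the real SDCN uses convolutional sub-networks on UWB-SAR imagery); both sub-networks here are linear, the data are synthetic, and the point is only that one objective, a denoising term plus a classification term, trains both sub-networks together:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: clean "signals", noisy observations, binary labels.
X_clean = rng.normal(size=(64, 16))
X_noisy = X_clean + 0.5 * rng.normal(size=(64, 16))
y = (X_clean.mean(axis=1) > 0).astype(float)

W_dec = np.eye(16)   # "decomposition" (denoising) sub-network, linear here
w_cls = np.zeros(16) # classification sub-network, linear here

def joint_loss():
    X_hat = X_noisy @ W_dec                  # denoised estimate
    logits = X_hat @ w_cls                   # classifier sees the denoised signal
    p = 1.0 / (1.0 + np.exp(-logits))
    l_dec = np.mean((X_hat - X_clean) ** 2)  # denoising (decomposition) term
    l_cls = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    return l_dec + l_cls                     # one objective for both sub-networks

loss_before = joint_loss()
n, d = X_noisy.shape
for _ in range(200):  # joint gradient descent on both sub-networks
    X_hat = X_noisy @ W_dec
    p = 1.0 / (1.0 + np.exp(-(X_hat @ w_cls)))
    g_logits = (p - y) / n
    grad_W = (2 * X_noisy.T @ (X_hat - X_clean) / (n * d)
              + X_noisy.T @ np.outer(g_logits, w_cls))
    grad_w = X_hat.T @ g_logits
    W_dec -= 0.1 * grad_W
    w_cls -= 0.1 * grad_w
loss_after = joint_loss()
```

Because the classification gradient also flows into W_dec, the denoiser is pushed toward reconstructions that are useful for discriminating targets from confusers, which is the motivation for joint rather than sequential training.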
DFDL: Discriminative Feature-oriented Dictionary Learning for Histopathological Image Classification
In histopathological image analysis, feature extraction for classification is
a challenging task due to the diversity of histology features suitable for each
problem as well as the presence of rich geometrical structure. In this paper, we
propose an automatic feature discovery framework for extracting discriminative
class-specific features and present a low-complexity method for classification
and disease grading in histopathology. Essentially, our Discriminative
Feature-oriented Dictionary Learning (DFDL) method learns class-specific
features which are suitable for representing samples from the same class while
being poorly capable of representing samples from other classes. Experiments on
three challenging real-world image databases: 1) histopathological images of
intraductal breast lesions, 2) mammalian lung images provided by the Animal
Diagnostics Lab (ADL) at Pennsylvania State University, and 3) brain tumor
images from The Cancer Genome Atlas (TCGA) database show the significance of
the DFDL model in a variety of problems over state-of-the-art methods.
Comment: Accepted to IEEE International Symposium on Biomedical Imaging
(ISBI), 201
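The "represents its own class well, other classes poorly" principle can be illustrated with a toy residual classifier. This is a simplified stand-in, not DFDL itself: instead of learned sparse dictionaries, each class gets the leading principal directions of its training data, and a test sample is assigned to the class with the smallest reconstruction residual; all data are synthetic:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy features: two classes living near different low-dimensional subspaces.
def make_class(basis, n):
    return (rng.normal(size=(n, basis.shape[0])) @ basis
            + 0.05 * rng.normal(size=(n, basis.shape[1])))

B0 = rng.normal(size=(3, 20))
B1 = rng.normal(size=(3, 20))
train = {0: make_class(B0, 40), 1: make_class(B1, 40)}

# Simple subspace "dictionary" per class: top principal directions via SVD.
def dictionary(X, k=3):
    _, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    return Vt[:k]

D = {c: dictionary(X) for c, X in train.items()}
means = {c: X.mean(axis=0) for c, X in train.items()}

def classify(x):
    # Smallest reconstruction residual wins: same-class atoms represent the
    # sample well, other-class atoms poorly.
    res = {}
    for c, Dc in D.items():
        xc = x - means[c]
        res[c] = np.linalg.norm(xc - (xc @ Dc.T) @ Dc)
    return min(res, key=res.get)

test0, test1 = make_class(B0, 10), make_class(B1, 10)
acc = (sum(classify(x) == 0 for x in test0)
       + sum(classify(x) == 1 for x in test1)) / 20
```

DFDL replaces the SVD step with a discriminatively trained sparse dictionary per class, but the decision rule, classify by representation residual, is the same low-complexity idea.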
Efficient Finetuning Large Language Models For Vietnamese Chatbot
Large language models (LLMs), such as GPT-4, PaLM, and LLaMa, have been shown
to achieve remarkable performance across a variety of natural language tasks.
Recent advancements in instruction tuning have given LLMs the ability to
follow users' instructions and produce human-like responses. However, the high costs
associated with training and implementing LLMs pose challenges to academic
research. Furthermore, the availability of pretrained LLMs and
instruction-tuning datasets for the Vietnamese language is limited. To tackle
these concerns, we
leverage large-scale instruction-following datasets from open-source projects,
namely Alpaca, GPT4All, and Chat-Doctor, which cover the general domain and
the medical domain. To the best of our knowledge, these are the first
instruction datasets for Vietnamese. Subsequently, we utilize
parameter-efficient tuning through Low-Rank Adaptation (LoRA) on two open LLMs:
Bloomz (Multilingual) and GPTJ-6B (Vietnamese), resulting in four models:
Bloomz-Chat, Bloomz-Doctor, GPTJ-Chat, and GPTJ-Doctor. Finally, we assess the
effectiveness of our methodology on a per-sample basis, taking into
consideration the helpfulness, relevance, accuracy, and level of detail of the
responses. This evaluation process entails the utilization of GPT-4 as an
automated scoring mechanism. Despite utilizing a low-cost setup, our method
demonstrates about 20-30\% improvement over the original models in our
evaluation tasks.
Comment: arXiv admin note: text overlap with arXiv:2304.08177,
arXiv:2303.16199 by other authors.
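The LoRA update used here can be sketched in a few lines: the pretrained weight W is frozen, and a trainable low-rank product B @ A, scaled by alpha / r, is added to it. The sizes below are toy values for illustration; real LLM layers are orders of magnitude larger:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 64, 4, 8                 # toy layer width, LoRA rank, and scaling

W = rng.normal(size=(d, d))            # frozen pretrained weight (never updated)
A = rng.normal(size=(r, d)) * 0.01     # trainable down-projection
B = np.zeros((d, r))                   # trainable up-projection, zero-initialized

def lora_forward(x):
    # Effective weight is W + (alpha / r) * B @ A; only A and B are trained.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d)
# Zero-initializing B makes the adapted model match the base model at step 0.
assert np.allclose(lora_forward(x), W @ x)

trainable = A.size + B.size            # 2 * r * d = 512 adapter parameters
full = W.size                          # d * d = 4096 parameters in the base layer
```

The parameter count is why the method suits a low-cost setup: only the small A and B matrices receive gradients and optimizer state, roughly an 8x reduction even at these toy sizes, and far more at LLM scale.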
COPPER-MODIFIED MCM-22 AS CATALYSTS FOR HYDROCARBON SELECTIVE CATALYTIC REDUCTION OF NOX
Joint Research on Environmental Science and Technology for the Earth
Few-Shot Object Detection via Synthetic Features with Optimal Transport
Few-shot object detection aims to simultaneously localize and classify the
objects in an image with limited training samples. However, most existing
few-shot object detection methods focus on extracting features from the few
available samples of the novel classes, which lack diversity. Hence, these
features may not suffice to capture the data distribution. To address this
limitation, in this paper, we
propose a novel approach in which we train a generator to generate synthetic
data for novel classes. Still, directly training a generator on the novel class
is not effective due to the lack of novel data. To overcome that issue, we
leverage the large-scale dataset of base classes. Our overarching goal is to
train a generator that captures the data variations of the base dataset. We
then transform the captured variations into novel classes by generating
synthetic data with the trained generator. To encourage the generator to
capture data variations on base classes, we propose to train the generator with
an optimal transport loss that minimizes the optimal transport distance between
the distributions of real and synthetic data. Extensive experiments on two
benchmark datasets demonstrate that the proposed method outperforms the state
of the art. The source code will be made available.
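In one dimension, the optimal transport (Wasserstein-1) distance between two equal-size empirical samples reduces to sorting both samples and averaging the pointwise gaps. A hedged sketch of using it as a generator loss follows; the paper's generator and features are far richer, and the one-parameter shift "generator" here is purely illustrative:

```python
import numpy as np

def w1_distance(real, synth):
    # Empirical 1-D Wasserstein-1 distance: sort both samples, average the gaps.
    return np.abs(np.sort(real) - np.sort(synth)).mean()

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=1000)   # "real" feature distribution
noise = rng.normal(0.0, 1.0, size=1000)  # generator input noise

theta = 1.0  # toy one-parameter "generator": x -> x + theta
loss_before = w1_distance(real, noise + theta)
for t in range(50):
    eps, lr = 1e-3, 0.3 / np.sqrt(t + 1)
    # Finite-difference gradient of the OT loss with respect to theta.
    g = (w1_distance(real, noise + theta + eps)
         - w1_distance(real, noise + theta - eps)) / (2 * eps)
    theta -= lr * g
loss_after = w1_distance(real, noise + theta)
```

Minimizing this distance pulls the synthetic distribution toward the real one, which is the role the optimal transport loss plays in training the feature generator on the base classes.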